Activation Function
In the Sigmoid activation function we get a value between 0 and 1
Its derivative ranges between 0 and 0.25
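As a minimal plain-Python sketch (no framework assumed, function names are mine), sigmoid and its derivative look like this; the derivative peaks at 0.25 when z = 0:

```python
import math

def sigmoid(z):
    # squashes any real z into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z)); maximum value 0.25, reached at z = 0
    s = sigmoid(z)
    return s * (1.0 - s)
```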
The Tanh activation function gives a value ranging between -1 and 1
The derivative of tanh ranges between 0 and 1
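A similar sketch for tanh: its derivative is 1 - tanh(z)^2, which reaches its maximum of 1 at z = 0:

```python
import math

def tanh_derivative(z):
    # d/dz tanh(z) = 1 - tanh(z)^2; lies in (0, 1], maximum 1 at z = 0
    return 1.0 - math.tanh(z) ** 2
```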
In Sigmoid and Tanh the Vanishing Gradient problem arises; therefore, to avoid that, we use ReLU
Its value is given by max(z, 0)
Derivative of ReLU: 1 for z > 0, 0 for z < 0 (undefined at exactly z = 0; in practice it is taken as 0)
ReLU is mostly used in hidden layers because it solves the Vanishing Gradient problem.
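A minimal sketch of ReLU and its derivative:

```python
def relu(z):
    # max(z, 0)
    return z if z > 0 else 0.0

def relu_derivative(z):
    # 1 for z > 0, 0 for z < 0 (we also take 0 at z = 0)
    return 1.0 if z > 0 else 0.0
```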
Now there is a problem in ReLU: when z < 0 the derivative is 0, and when we substitute that 0 into the chain rule during weight updation, there is no change between the new and old weights, so the neuron stops learning (the Dead/Dying ReLU problem). To overcome that problem we use Leaky ReLU
In Leaky ReLU, a small constant (commonly 0.01) is multiplied with z on the negative side, so the derivative is never 0 and there will always be some change between the new and old weight values
A small nonzero value is present when z < 0 in the derivative of Leaky ReLU
When z < 0, instead of 0 we substitute 0.01 (in the derivative of the Leaky ReLU formula, 0.01 replaces the 0). This gradient is very, very small, so when you substitute it in the main weight-updation formula the change between the old and new weights is tiny; learning on the negative side is extremely slow, which is close to a Vanishing Gradient situation
To overcome the problems of Leaky ReLU, the ELU activation function is used.
Whenever the z/x value is greater than 0 it behaves like max(0, x); but when the x/z value is < 0, we handle the negative values in a smoother way: alpha * (e^z - 1). Here alpha is a hyperparameter (fixed before training)
So whenever we find the derivative of this function, we get the structure shown below. It handles the negative range with a nonzero gradient that keeps decreasing smoothly as the negative value grows larger in magnitude
The only disadvantage of this activation function is that it takes more computation time compared to ReLU, Leaky ReLU, etc., because of the exponential
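A sketch of ELU with the standard negative branch alpha * (e^z - 1); its derivative stays nonzero for z < 0 and decays smoothly toward 0:

```python
import math

def elu(z, alpha=1.0):
    # smooth negative branch: alpha * (e^z - 1), bounded below by -alpha
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

def elu_derivative(z, alpha=1.0):
    # nonzero for z < 0, decaying toward 0 as z becomes more negative
    return 1.0 if z > 0 else alpha * math.exp(z)
```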
Parametric ReLU (PReLU) is similar to ReLU
z > 0 : f(z) = max(z, 0) = z ; z < 0 : f(z) = alpha * z
Here alpha is parametric: it is a learnable parameter, updated during training along with the weights (unlike the fixed 0.01 of Leaky ReLU)
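A sketch of PReLU; the shape is the same as Leaky ReLU, and the extra piece is the gradient with respect to alpha, which is what lets training update alpha itself (function names are mine):

```python
def prelu(z, alpha):
    # same shape as Leaky ReLU, but alpha is a trainable parameter
    return z if z > 0 else alpha * z

def prelu_grad_alpha(z, alpha):
    # d f / d alpha = 0 for z > 0, and z for z < 0;
    # gradient descent uses this to learn alpha
    return 0.0 if z > 0 else z
```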
Formula: y = z * sigmoid(z). This is called SELF-GATING (the Swish activation). It is mostly used in LSTMs
It is computationally expensive
z = summation of (weights * inputs) + bias
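Putting the two lines above together, a hedged sketch of computing z and then Swish (the function names are mine, not from the notes):

```python
import math

def pre_activation(weights, inputs, bias):
    # z = sum(w_i * x_i) + b
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def swish(z):
    # y = z * sigmoid(z): the input "gates" itself (self-gating)
    return z / (1.0 + math.exp(-z))
```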
It solves the Dead Activation problem which we face in ReLU
The log is used to handle the negative values, i.e. when z < 0; positive values are also handled here (this describes the Softplus function, f(z) = ln(1 + e^z))
So instead of applying max(0, x), it uses this smooth curve
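Assuming the function meant here is Softplus (the notes do not name it, but ln(1 + e^z) matches the description), a minimal sketch:

```python
import math

def softplus(z):
    # ln(1 + e^z): smooth version of max(0, z); the output is always > 0,
    # so the "dead" region of ReLU disappears
    return math.log(1.0 + math.exp(z))
```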
Sigmoid Activation Function -- used when we have binary classification
For example, 60% it is saying DOG, 40% it is saying cat.
Now what we do is: when the value is > 0.5, we will consider that class, and that will be the output
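The thresholding step can be sketched like this (class names taken from the dog/cat example above; the function name is mine):

```python
def binary_decision(p_dog, threshold=0.5):
    # p_dog is the sigmoid output, read as P(dog)
    return "DOG" if p_dog > threshold else "cat"
```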
The Softmax activation function is basically used when we have many categories to be classified in the output layer. Suppose we have the following Neural Network architecture.
Whenever we have more than 2 output categories we use SOFTMAX
If we pass an image, we want to see whether it belongs to cat, dog, monkey, or rat
Using softmax, for each class we will see an output probability
We use this formula: softmax(x_j) = e^{x_j} / Σ_k e^{x_k}
x in the figure below is weights * inputs + bias
Now assume that before applying the Softmax function we get these values
[40, 30, 10, 5]
Now when we apply the softmax formula we get the final values as follows
So we substitute each x_j from index 1 to 4 of [40, 30, 10, 5] into e^{x_j} / (e^40 + e^30 + e^10 + e^5)
Because the gaps between these scores are large, softmax saturates: for 40 we get ≈ 0.99995, for 30 we get ≈ 0.000045, and for 10 and 5 the results are practically 0
The final prediction is the class with the highest value, here the one with score 40
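The worked example can be checked with a small sketch; note that with scores as far apart as [40, 30, 10, 5], softmax saturates and the top class takes almost all of the probability mass:

```python
import math

def softmax(xs):
    # shift by the max for numerical stability; the shift cancels out
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([40, 30, 10, 5])
```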
Sigmoid and Softmax are always kept in the LAST, i.e. OUTPUT, layer (Sigmoid for binary classification, Softmax for multi-class)